Chapter 13 - On-line Learning from Finite Training Sets

By David Barber, Department of Medical Biophysics, University of Nijmegen, 6525 EZ Nijmegen, The Netherlands, and Peter Sollich, Department of Physics, University of Edinburgh, Edinburgh EH9 3JZ, U.K.
Edited by David Saad, Aston University

Book: On-Line Learning in Neural Networks
Published online: 28 January 2010
Print publication: 28 January 1999, pp. 279-302
Abstract
We analyse online gradient descent learning from finite training sets at non-infinitesimal learning rates η for both linear and non-linear networks. In the linear case, exact results are obtained for the time-dependent generalization error of networks with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optimal choice of learning rate η. We also compare online and offline learning, with η optimized separately for each at a given final learning time. Online learning turns out to be much more robust to input bias and actually outperforms offline learning when such bias is present; for unbiased inputs, online and offline learning perform almost equally well. Our analysis of online learning for non-linear networks (namely, soft committee machines) advances the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
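To make the online/offline distinction concrete, the following is a minimal sketch of the two update rules for a linear network on a fixed training set. All sizes, learning rates, and iteration counts here are illustrative assumptions, not the values analysed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative linear student/teacher setup (parameters are assumptions).
N, p, eta = 20, 40, 0.02
X = rng.standard_normal((p, N))      # finite training set of p inputs
y = X @ rng.standard_normal(N)       # noise-free teacher outputs

# Offline (batch) learning: each update uses the whole training set.
w_off = np.zeros(N)
for _ in range(2000):
    w_off -= eta * X.T @ (X @ w_off - y) / p

# Online learning: each update uses a single randomly drawn example.
w_on = np.zeros(N)
for _ in range(2000 * p):
    i = rng.integers(p)
    w_on -= eta * (X[i] @ w_on - y[i]) * X[i]

train_err_off = np.mean((X @ w_off - y) ** 2)
train_err_on = np.mean((X @ w_on - y) ** 2)
```

Both rules descend the same squared training error; the chapter's comparison is made at matched final learning time, with η optimized separately for each rule.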
Introduction
The analysis of online (gradient descent) learning, which is one of the most common approaches to supervised learning found in the neural networks community, has recently been the focus of much attention. The characteristic feature of online learning is that the weights of a network (‘student’) are updated each time a new training example is presented, such that the error on this example is reduced.
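The update rule just described can be sketched for a linear student trained on a finite set of p = αN examples. All parameter values below (N, α, η, number of steps) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the online rule above for a linear student (values illustrative).
N = 50                             # number of student weights
alpha = 2.0                        # training set size ratio
p = int(alpha * N)                 # p = alpha * N stored examples
eta = 0.01                         # finite (non-infinitesimal) learning rate

w_star = rng.standard_normal(N)    # 'teacher' weights defining the task
X = rng.standard_normal((p, N))    # the finite training set
y = X @ w_star                     # noise-free target outputs

w = np.zeros(N)                    # student weights
for _ in range(20000):
    i = rng.integers(p)            # present one training example
    delta = X[i] @ w - y[i]        # student's error on this example
    w -= eta * delta * X[i]        # gradient step reducing that error

# Generalization error: squared error averaged over fresh inputs.
X_fresh = rng.standard_normal((2000, N))
gen_err = np.mean((X_fresh @ (w - w_star)) ** 2)
```

Because the training set is finite, the same p examples are revisited many times, so the weight dynamics depend on α; this is the regime the chapter analyses.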